Abstract

The Silicon Micro-strip Tracker (SMT) at the D0 experiment in the Fermilab Tevatron collider has been operating since 
2001. In 2006, an additional layer, referred to as 'Layer 0', was installed to improve impact parameter resolution and 
compensate for detector degradation due to radiation damage to the original innermost SMT layer. The SMT detector 
provides valuable tracking and vertexing information for the experiment. This contribution will highlight aspects of the 
long-term operation of the SMT, including the impact of the silicon readout test-stand. Due to the full integration of the 
test-stand into the D0 trigger framework, this test-stand provides an advantageous tool for training of new experts and 
studying subtle effects in the SMT while minimizing impact on the global data acquisition. 

© 2011 Published by Elsevier BV. Selection and/or peer-review under responsibility of the organizing committee for 
TIPP 2011. 
1. Introduction 

The Run II D0 detector has been operating since 2001. It consists of two central tracking detectors 
inside a 2 T solenoidal magnet; central and forward preshower systems; liquid argon calorimeters; and 
muon spectrometers including a 1.8 T toroidal magnet. The Silicon Micro-strip Tracker (SMT) is part of 
the central tracking system of the D0 detector and is the innermost layer of instrumentation [1]. Thus 
radiation damage is a potential issue and needs to be monitored and addressed [1]. The SMT layout is 
shown in Figure 1. It consists of six barrels, each with four layers. These barrels are interspersed with six 
disks of small radius, so-called 'F-disks'. There are another six 'F-disks' beyond the end of the barrels. 
Two (originally four) large radius detectors, so-called 'H-disks', are located at the ends of the detector to 
enhance tracking at very large pseudo-rapidities |η| < 3. The barrels provide tracking for particles with 
high transverse momentum in the central region |η| < 1.5, while the disk detectors allow for the precise 
reconstruction of particles traveling with pseudo-rapidity up to |η| < 3. A major SMT upgrade took place 
in 2006 to install an innermost layer (Layer 0) [3]. This single-layer detector consists of eight barrels and 
was installed to mitigate the degradation of the first layer of the original SMT due to radiation damage. One 
Fig. 1. The upgraded SMT detector consists of 6 barrels, 12 'F-disks' and 2 'H-disks'. Barrels are interspersed with 'F-disks'. 
Additional 'F-disks' and 'H-disks' are placed at both ends of the detector. 



'H-disk' was removed from each end of the detector in 2006 to accommodate the Layer 0 readout channels. 
There are two different types of readout chips used for the SMT: SVX-IIe [4] for the original SMT and SVX4 
readout chips [5] for Layer 0. The SVX-IIe readout chips are mounted on so-called HDIs (High Density 
Interconnects), made from Kapton flex circuits laminated to Beryllium substrates. The silicon sensors are 
glued to them and the resulting assembly is referred to as a 'module' [7]. There are 432 such modules for the 
barrels, 288 for the 'F-disks' and 96 for the 'H-disks', for a total of 5712 SVX-IIe chips installed. For Layer 0 
the HDIs are ceramic hybrids made of Beryllium oxide. Including Layer 0 there are a total of 730k readout 
channels providing the largest data flow of all D0 sub-detectors. 



2. Long-term Operational Experience 



The SMT has generally operated 24 hours per day, 7 days a week since 2001, which required dedicated 
shift personnel and experts. In general, the operation is very stable and the response and recovery from 
problems is usually quick. Shift personnel are supported by on-call experts. Furthermore, senior experts 
are available for support in various aspects. Over time many tools have been developed to monitor low and 
high level information such as voltages of power supplies or on-line cluster charge and size histograms 
as well as on-line track efficiencies. The hardware and readout chain of the SMT, as sketched in Figure 2, 
is distributed over several physical locations. These locations are not entirely accessible on a daily basis: 
the 'Horse shoe' and 'Cathedral' areas only during longer shutdowns, whereas the 'Platform' area can be 
accessed between stores with agreement of Tevatron operations. 
A failure in the hardware and readout chain needs to be understood and then traced down to its physical 
location with the monitoring information at hand. In order to do that it is very important to have monitoring 
capabilities for low and high level information. This includes the monitoring of voltages and current draws 
of the power supplies (PS) of the detector. For example, a failure of a power supply at the 'Platform' area 
typically causes lower efficiency for approximately a couple of hours until the failure is addressed. On average 




Fig. 2. The hardware and readout chain of the original SMT from the sensor-HDI 
level to the movable counting house level (MCH) via horse shoe, cathedral and 
platform level. 
all power supply failures, including the ones in the cathedral area, compromised about 2.5% of the collected 
data. The electrical crew developed a robotic remote switch 'R2D0' to switch to a spare PS within minutes 
for the 'Platform' PS failures. All SMT power supplies at the 'Platform' area have since been equipped with 
'R2D0' units and resulting data quality losses are minimized. 
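
The low-level monitoring described above can be illustrated with a minimal sketch. All names, nominal values and tolerances below are hypothetical and are not taken from the D0 monitoring software; the sketch only shows the idea of flagging a power supply whose voltage or current draw leaves its allowed window, so that a switch to a spare could be requested.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One monitoring sample of a power supply (all values hypothetical)."""
    supply: str
    volts: float
    amps: float


@dataclass
class Limits:
    """Allowed operating window; the numbers used below are assumptions."""
    v_nominal: float
    v_tolerance: float  # allowed absolute deviation from nominal, in volts
    i_max: float        # maximum allowed current draw, in amperes


def failing_supplies(readings, limits):
    """Return the names of supplies whose voltage or current is out of range."""
    bad = []
    for r in readings:
        lim = limits[r.supply]
        v_bad = abs(r.volts - lim.v_nominal) > lim.v_tolerance
        i_bad = r.amps > lim.i_max
        if v_bad or i_bad:
            bad.append(r.supply)
    return bad


# Hypothetical supply names and limits, for illustration only.
limits = {
    "platform_ps_1": Limits(v_nominal=5.0, v_tolerance=0.25, i_max=3.0),
    "platform_ps_2": Limits(v_nominal=5.0, v_tolerance=0.25, i_max=3.0),
}
readings = [
    Reading("platform_ps_1", volts=5.05, amps=2.1),
    Reading("platform_ps_2", volts=0.3, amps=0.0),  # this supply has tripped
]
flagged = failing_supplies(readings, limits)
print(flagged)
```

In such a scheme the list of flagged supplies would be the input to whatever mechanism performs the switch to the spare PS.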

3. SMT test-stand activities 

The SMT test-stand provides a small 'copy' of the full hardware and readout chain of the SMT. In 
contrast to the SMT, all parts are easily accessible, allowing for detailed studies of single components. Figure 



Fig. 3. The picture shows part of the SMT test-stand setup: sensor and a light emitting diode (lower left), PS (upper left), 
optional signal delay generator with respect to the Tevatron clock (right). 

Fig. 4. The graph shows the fraction of enabled HDIs for Barrel, 'F-disk' and 'H-disk' sensors versus time. The shaded yellow 
bands reflect the shutdown periods of the D0 experiment. For a more detailed explanation of the steps in the fraction of enabled 
HDIs see text. Plot title: Enabled HDIs versus time (September 29, 2011). 

3 shows parts of the test-stand setup for signal-to-noise studies with HV supply (top left), sensor with a light 
emitting diode (bottom left) and optional signal delay generator with respect to the Tevatron clock (right). 
In general, spare boards are tested at the test-stand before they are installed. This is also true during the 
longer shutdowns where problematic boards are replaced. During these shutdowns every effort is made 
to improve the system stability and efficiency. For a complex system like the SMT it is difficult to cite a 
single quantity characterizing global performance. A good measure is the fraction of enabled HDIs as a 
function of time, as shown in Figure 4. Prior to the 2009 shutdown there was a gradual decrease of this 
fraction due to hardware issues during continued operation. There are many different types of failures and 
the most common ones are individual chip failures as well as bad cable connections at various levels, as 
given in Table 1 (larger steps in the fraction of enabled HDIs are explained later in the text). In order to 



Type of HDI failure                   2009 / # HDI    2010 / # HDI
Adapter card                          16              1
Clock cables                          9               10
Interface Board                       15
Reseating cables                      6               2
Bad/dead (problem inside detector)    20              Not re-visited
Disabling bad chips                   24              4
Total # HDI worked on                 90              17

Table 1. Detailed list of HDI defects for the years 2009 - 2010. 

trace such a failure to its underlying cause, every failure is characterized and a record of previously tried 
interventions is maintained. The shutdown periods are highlighted with shaded bands in Figure 4. They 
allowed for the time-consuming task of investigating and fixing these individual failures. The efforts resulted in 
higher fractions of enabled HDIs after a shutdown. Another type of failure is broken wire-bonds, 
which interrupted the distribution of the digital power lines to the readout chips. As the readout chips 
on an individual HDI are daisy-chained, a single chip failure caused the 'loss' of all subsequent chips of 
a module consisting of up to 9 chips. The most prominent occasion occurred in late 2006 but it is likely 
that this sort of failure also contributed to the losses of enabled HDIs prior to that incident. The test-stand 
facilitated the development of a solution to this problem by using an alternative path to distribute the digital 
power using a special hardware board (new adapter card). Thus the initial failing chip was bypassed and the 
readout of the remaining chips on the module could be fully restored, as implemented during the shutdown 
in 2007, which increased the fraction of enabled HDIs by about 10%. Furthermore, the test-stand facilitated 
the development of an improved sequencer firmware version as well as a modified version of the adapter 
card in order to fix a noise problem. Both were installed during the shutdown in 2008 and increased the 
fraction of enabled HDIs. An intensive and thorough investigation of all known sorts of failures took place 
during the shutdown in 2009 and resulted in the largest fraction of enabled HDIs. The tireless efforts during 
the past shutdowns allowed re-enabling of HDIs and led to an all-time high number of enabled HDIs. 
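
The effect of a single failing chip in a daisy chain, and of bypassing it, can be sketched as follows. This is a simplified toy model and not the actual readout logic: the function name and the chip indexing are assumptions, and the model only encodes the statement above that without the bypass the failing chip and all subsequent chips are lost, while with the bypass only the failing chip is lost.

```python
def readable_chips(n_chips, failed_index, bypass=False):
    """Return the indices of chips on one HDI that can still be read out.

    Chips are daisy-chained in index order (hypothetical numbering).
    Without a bypass the chain is interrupted at the failed chip, so that
    chip and every later one are lost. With the bypass only the failed
    chip itself is lost.
    """
    if bypass:
        return [i for i in range(n_chips) if i != failed_index]
    return list(range(failed_index))


# A 9-chip module in which the third chip (index 2) has failed.
before = readable_chips(9, failed_index=2)
after = readable_chips(9, failed_index=2, bypass=True)
print(len(before), len(after))
```

In this toy example the module goes from two readable chips to eight once the failing chip is bypassed.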
The test-stand was also used for detailed firmware studies in order to improve signal-to-noise (S/N) for 
the sensors controlled and read out by the SVX-IIe type of chips. The pedestal distribution for the 'old' 
firmware and the 'new' (improved) firmware is shown in Figure 5 a) and b). By moving certain activities 
Fig. 5. Pedestal distribution for 6 chips with 128 channels per chip. First three chips are p-side and last three chips are n-side. The 
pedestal distribution for the 'Old Firmware' is shown on the left, whereas the one for the 'New Firmware' is shown on the right. 

on control lines to a different point in time, a significant reduction of the noise level was achieved. The 
biggest impact in terms of reducing the noise was achieved by moving control signals for 'PreAmp'-reset 
and 'RampReference'-select further away from the start of digitization. For n-side sensors the noise 
was reduced by approximately 20%, whereas for the p-side sensors the noise level was stable. The noise 
source does not couple in the same way to all channels, as can be seen in Figure 5 a). Our interpretation is 
that the control signal pulse generates noise on the chip. The previously persistent second band structure is 
now completely removed, as can be seen by comparing Figure 5 a) and b). This firmware is now used for 
the entire SMT. 
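
As an illustration of how such a comparison can be quantified, the sketch below computes a per-channel pedestal and noise from repeated ADC readings and the relative noise reduction between two firmware versions. It assumes the common convention that the pedestal is the mean ADC count without signal and the noise is the spread around it; the source does not specify the exact definitions used. The ADC values are invented for the example and are not measured data.

```python
from statistics import mean, pstdev


def pedestal_and_noise(adc_samples):
    """Return (pedestal, noise) for one channel.

    Pedestal is taken as the mean ADC count without signal, noise as the
    population standard deviation around that mean (assumed convention).
    """
    return mean(adc_samples), pstdev(adc_samples)


def relative_noise_reduction(noise_old, noise_new):
    """Fractional noise reduction when going from old to new firmware."""
    return (noise_old - noise_new) / noise_old


# Invented ADC counts for a single channel (not measured data).
old_fw_samples = [50, 55, 45, 50, 55, 45, 50, 50]
new_fw_samples = [50, 54, 46, 50, 54, 46, 50, 50]

ped_old, noise_old = pedestal_and_noise(old_fw_samples)
ped_new, noise_new = pedestal_and_noise(new_fw_samples)
reduction = relative_noise_reduction(noise_old, noise_new)
print(f"pedestal {ped_old} -> {ped_new}, noise reduced by {reduction:.0%}")
```

The sample values were chosen so that the pedestal stays unchanged while the noise drops by one fifth, mirroring the size of the effect quoted for the n-side sensors.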

The D0 data acquisition (DAQ) is a buffered system and consequently the dead-time or front-end busy rate 
(FEB) is driven by the amount of data and the ability to process it. Figure 6 shows a simplified sketch of the 
data flow in the D0 experiment. Data are processed by means of a multi-level trigger system (L1, L2, L3). 
The red arrows indicate a busy signal at different levels if no free buffer is available. 
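
The relation between buffer availability and busy rate in a buffered readout can be illustrated with a deliberately simple toy model. It is not a description of the D0 DAQ: the fixed trigger period, the fixed time for which an event occupies a buffer, and all parameter names are assumptions made for the sketch. A trigger that finds no free buffer is counted as busy.

```python
def busy_fraction(n_buffers, hold_ticks, trigger_period, n_ticks):
    """Fraction of triggers that find no free buffer (front-end busy).

    A trigger arrives every `trigger_period` ticks. An accepted event
    occupies one buffer for `hold_ticks` ticks before it is released.
    """
    release_times = []  # tick at which each occupied buffer becomes free
    triggers = 0
    rejected = 0
    for tick in range(0, n_ticks, trigger_period):
        # Release buffers whose processing has finished by this tick.
        release_times = [t for t in release_times if t > tick]
        triggers += 1
        if len(release_times) < n_buffers:
            release_times.append(tick + hold_ticks)
        else:
            rejected += 1
    return rejected / triggers


# High trigger rate: halving the number of buffers raises the busy fraction.
high_rate_4 = busy_fraction(n_buffers=4, hold_ticks=10, trigger_period=2, n_ticks=1000)
high_rate_2 = busy_fraction(n_buffers=2, hold_ticks=10, trigger_period=2, n_ticks=1000)
# Low trigger rate: the same reduction has no effect in this toy model.
low_rate_4 = busy_fraction(n_buffers=4, hold_ticks=10, trigger_period=5, n_ticks=1000)
low_rate_2 = busy_fraction(n_buffers=2, hold_ticks=10, trigger_period=5, n_ticks=1000)
print(high_rate_4, high_rate_2, low_rate_4, low_rate_2)
```

Even in this crude model, losing buffers matters mainly when the trigger rate is close to or above what the remaining buffers can absorb.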

Individual SMT crates showed a very peculiar FEB pattern: one would expect the SMT crate leading 
in FEB to be the one with the highest data processing load, as it takes more time to process more data. Instead, the 
FEB-leading SMT crate seems to appear randomly, as shown in Figure 7 a). It shows FEB rates [%] of all 
SMT crates (different colors) as a function of time, with the two crates showing the peculiar FEB pattern 
highlighted by the red circles. This happened on an apparently random basis but more frequently at higher 
Fig. 6. Simplified sketch of the data flow from left to right. The red arrows indicate a busy signal at different levels if no free buffer is 
available. 



trigger rates. The buffer handling is organized by a VME read-out board controller (VRBC) (S] which 
controls the VME read-out boards (VRB) [6], which in turn gather the data from the sequencer 
level as sketched in Figure O. The VRBC firmware was extended with monitoring capabilities for buffer 
management. The two plots in Figure 7 b) show the available buffers (blue), buffers waiting for L2 (green) 




Fig. 7. a) shows FEB rates [%] of all SMT crates (different colors) as a function of time without the improved buffer handling firmware. 
The two plots in b) show the available buffers (blue), buffers waiting for L2 (green) or L3 decisions (red) as a function of time. As an 
example the buffer distribution is shown for the two SMT crates which exhibit an increased FEB rate (top plot, crate 0x65 & 0x67) as 
highlighted by the red ellipses and arrows. c) shows the FEB rates [%] of various detector subsystem crates grouped by colors (SMT 
crates are colored in red). In addition the global D0 L1 busy rate (green) consisting of all L1 subsystems is shown. More details in the 
text. 



or L3 decisions (red) as a function of time. The buffer distribution is shown for the two SMT crates which 
exhibit the increased FEB rate (SMT crates 0x65 and 0x67) as highlighted by the red ellipses and arrows. 
A good correlation between the number of available buffers and the FEB was seen. In general there are fewer 
available buffers at higher trigger rates. The red circles connected by arrows in Figure 7 a)-b) highlight the 
peculiar FEB pattern shown by two different SMT crates. This effect was due to the sudden reduction of 
available buffers (blue) causing increased dead-times for the affected SMT crates. Figure 7 c) shows the FEB 
rates [%] of various detector subsystem crates (SMT crates are colored in red). In addition the global D0 L1 
busy rate consisting of all L1 subsystems is plotted (green). The yellow ellipses highlight an increase of the 
global L1 busy rate caused by raised FEB rates of particular SMT crates. This illustrates how the sudden 
reduction of available buffers in SMT crates affected the global L1 busy rate. At higher trigger rates (around 
-50 minutes) the effect is not large. There is an increase of the global L1 busy rate by approximately 2% 
at the same time as the jump in FEB for an SMT crate: from 10% to approximately 12%. At lower trigger 
rates (around -5 minutes) the effect is smaller and the global L1 busy rate increases only by about 0.4%. 
The latter can be understood as the reduced number of buffers has the largest impact at high data taking rates. 
The SMT test-stand allowed tests at high rates of new versions of the VRBC firmware handling buffer 
management. A more robust VRBC firmware version was developed and it did not show this problem anymore. 
Monitoring data for this modified VRBC firmware version are shown in Figure 8 a)-b). Figure 8 a) shows the 
FEB rates [%] of all SMT crates (different colors) as a function of time with the improved buffer handling 
firmware. There are no SMT crates showing a significantly higher FEB rate. The increased FEB rates visible 
at the end of the distribution were due to a change in prescale settings, which increased the event rates. 
Figure 8 b) shows the available buffers (blue), buffers waiting for L2 (green) or L3 decisions (red) 
as a function of time for two different SMT crates. There are no sudden drops in the number of available 
buffers anymore. 
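
A check for the kind of sudden drop in available buffers discussed above could look like the following sketch. The function name, the threshold and the sample series are hypothetical and only illustrate the idea; they do not reproduce the monitoring that was added to the VRBC firmware.

```python
def sudden_drops(available, min_drop):
    """Return sample indices where the number of available buffers falls
    by at least `min_drop` compared with the previous sample."""
    return [
        i
        for i in range(1, len(available))
        if available[i - 1] - available[i] >= min_drop
    ]


# Invented time series of available buffers for one crate.
with_problem = [14, 13, 14, 6, 6, 5, 6]
after_fix = [14, 13, 14, 13, 12, 13, 14]

drops_before = sudden_drops(with_problem, min_drop=5)
drops_after = sudden_drops(after_fix, min_drop=5)
print(drops_before, drops_after)
```

The threshold separates ordinary fluctuations of one or two buffers from a step change that persists over the following samples.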
Fig. 8. a) shows the FEB rates [%] of all SMT crates (different colors) as a function of time 
with the improved buffer handling firmware. The two plots in b) show again available buffers 
(blue), buffers waiting for L2 (green) or L3 decisions (red) as an example for two different 
SMT crates. More details in the text. 



4. Conclusions 

The SMT has been operated since 2001. Its performance and efficiency have been enhanced using new 
tools such as the 'R2D0' units. The SMT test-stand is a unique piece of equipment to train new experts 
as well as to reproduce and understand subtle effects in the SMT while minimizing impact on global data 
taking. Three examples have been presented: HDI recovery effort, optimization of signal-to-noise and the 
buffer management problem. In each case the results from the test-stand led to improved performance for 
the entire SMT system. The training of new experts at the test-stand allowed for new insights into the 
operation of the SMT, which in turn increased the stability and performance of the SMT. 

The SMT detector is performing very well, providing good tracking and vertexing capabilities for the D0 
experiment, which is vital for high efficiency b-tagging and electron/photon identification. 